 batting average


AI (Artificial Intelligence) Lessons From The World Series

#artificialintelligence

ARLINGTON, TEXAS - OCTOBER 27: Rays pitcher Blake Snell, second from left, comes out of the game ... against the Dodgers in the 6th inning in Game 6 of the World Series at Globe Life Field on October 27, 2020 in Arlington, Texas. I'm a lifelong Dodgers fan, and I waited 32 years for the team to win another World Series. During that time, the sport has changed a great deal. With huge amounts of data, sophisticated computers and advanced analytics available, strategies have become increasingly based on the numbers. It seems that AI (Artificial Intelligence) has come to dominate the decision-making process.


Calculating new stats in Major League Baseball with Amazon SageMaker | Amazon Web Services

#artificialintelligence

The 2019 Major League Baseball (MLB) postseason is here after an exhilarating regular season in which fans saw many exciting new developments. MLB and Amazon Web Services (AWS) teamed up to develop and deliver three new, real-time machine learning (ML) stats to MLB games: Stolen Base Success Probability, Shift Impact, and Pitcher Similarity Match-up Analysis. These features are giving fans a deeper understanding of America's pastime through Statcast AI, MLB's state-of-the-art technology for collecting massive amounts of baseball data and delivering more insights, perspectives, and context to fans in every way they're consuming baseball games. This post looks at the role machine learning plays in providing fans with deeper insights into the game. We also provide code snippets that show the training and deployment process behind these insights on Amazon SageMaker.


Optimal Testing in the Experiment-rich Regime

Schmit, Sven, Shah, Virag, Johari, Ramesh

arXiv.org Machine Learning

Motivated by the widespread adoption of large-scale A/B testing in industry, we propose a new experimentation framework for the setting where potential experiments are abundant (i.e., many hypotheses are available to test), and observations are costly; we refer to this as the experiment-rich regime. Such scenarios require the experimenter to internalize the opportunity cost of assigning a sample to a particular experiment. We fully characterize the optimal policy and give an algorithm to compute it. Furthermore, we develop a simple heuristic that also provides intuition for the optimal policy. We use simulations based on real data to compare both the optimal algorithm and the heuristic to other natural alternative experimental design frameworks. In particular, we discuss the paradox of power: high-powered classical tests can lead to highly inefficient sampling in the experiment-rich regime.


Bayesball: Bayesian analysis of batting average – Towards Data Science

#artificialintelligence

One of the topics in data science and statistics that I find interesting but have difficulty understanding is Bayesian analysis. During General Assembly's Data Science Immersive boot camp, I had a chance to explore Bayesian statistics, but I really think I need some review and reinforcement. This is my personal endeavour to better understand Bayesian thinking and how it can be applied to real-life cases. For this post, I am mainly inspired by a YouTube series by Rasmus Bååth, "Introduction to Bayesian data analysis". He is really good at giving you an intuitive understanding of Bayesian analysis, not by bombarding you with complicated formulas, but by walking you through the thought process behind Bayesian statistics. The topic I chose for this post is baseball.


Batting Order Setup in One Day International Cricket

Izadi, Masoumeh (Television Content Analytics) | Narula, Simranjeet (Television Content Analytics)

AAAI Conferences

In the professional sport of cricket, batting order assignment is of significant interest and importance to coaches, players, and fans as an influencing parameter on the game outcome. The impact of batting order on scoring runs is widely known and managers are often judged based on their perceived weakness or strength in setting the batting order. In practice, a combination of experts’ intuitions plus a few descriptive and sometimes conflicting performance statistics are used to assign an order to the batters in a team line-up before the games and in player replacement due to injuries during the games. In this paper, we propose the use of learning methods in automatic line-up order assignment based on several measures of performance and historical data. We discuss the importance of this problem in designing a winning strategy for cricket teams and the challenges this application introduces to the community and the currently existing approaches in AI.


Simulation of empirical Bayesian methods (using baseball statistics)

@machinelearnbot

We're approaching the end of this series on empirical Bayesian methods, and have touched on many statistical approaches for analyzing binomial (success / total) data, all with the goal of estimating the "true" batting average of each player. There's one question we haven't answered, though: do these methods actually work? Even if we assume each player has a "true" batting average as our model suggests, we don't know it, so we can't see if our methods estimated it accurately. For example, we think that empirical Bayes shrinkage gets closer to the true probabilities than raw batting averages do, but we can't actually measure the mean-squared error. This means we can't test our methods, or examine when they work well and when they don't.
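One way to make that question testable is to simulate players whose "true" averages are known by construction. The sketch below is only an illustration of that idea, not the series' own code: the Beta(78, 222) prior, the range of at-bats, the method-of-moments fit, and all function names are assumptions chosen for the example.

```python
import random
from statistics import fmean, pvariance


def simulate(n_players=1000, alpha=78.0, beta=222.0, seed=0):
    """Return (true_avg, hits, at_bats) tuples for simulated players.

    The prior parameters and the at-bat range are hypothetical.
    """
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        true_avg = rng.betavariate(alpha, beta)   # known "true" average
        at_bats = rng.randint(20, 600)
        # Binomial draw: one Bernoulli trial per at-bat.
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        players.append((true_avg, hits, at_bats))
    return players


def fit_beta_prior(players):
    """Crude method-of-moments fit of a beta prior to the raw averages.

    This is a simplification: it ignores binomial noise, so it tends to
    understate how concentrated the prior really is.
    """
    raw = [hits / ab for _, hits, ab in players]
    m, v = fmean(raw), pvariance(raw)
    strength = m * (1 - m) / v - 1                # estimate of alpha + beta
    return m * strength, (1 - m) * strength


def compare_mse(players):
    """Mean-squared error of raw vs. shrunken estimates against the truth."""
    a0, b0 = fit_beta_prior(players)
    raw_err = [(h / ab - t) ** 2 for t, h, ab in players]
    shrunk_err = [((h + a0) / (ab + a0 + b0) - t) ** 2 for t, h, ab in players]
    return fmean(raw_err), fmean(shrunk_err)


if __name__ == "__main__":
    raw_mse, shrunk_mse = compare_mse(simulate())
    print(f"raw MSE: {raw_mse:.6f}  shrunken MSE: {shrunk_mse:.6f}")
```

Because the true averages are generated rather than observed, the mean-squared error can be computed directly, which is exactly what real data does not allow.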


Understanding beta binomial regression (using baseball statistics)

#artificialintelligence

In this series we've been using the empirical Bayes method to estimate batting averages of baseball players. Empirical Bayes is useful here because when we don't have a lot of information about a batter, their estimate is "shrunken" towards the average across all players, as a natural consequence of the beta prior. But there is a complication with this approach: when players are better, they are given more chances to bat! (Hat tip to Hadley Wickham for pointing this complication out to me.) That means there's a relationship between the number of at-bats (AB) and the true batting average. For reasons I explain below, this makes our estimates systematically inaccurate.
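The shrinkage described here can be sketched in a few lines. With a Beta(alpha, beta) prior and H hits in AB at-bats, the posterior mean is (H + alpha) / (AB + alpha + beta). The prior values below (alpha = 78, beta = 222, a prior mean of 0.26) and the function name are hypothetical, chosen only to illustrate the effect.

```python
def shrunken_average(hits, at_bats, alpha, beta):
    """Posterior mean of a batting average under a Beta(alpha, beta) prior."""
    return (hits + alpha) / (at_bats + alpha + beta)


# Hypothetical prior with mean 78 / (78 + 222) = 0.26.
ALPHA, BETA = 78.0, 222.0

# Little information: 4 hits in 10 at-bats (raw average 0.400).
few = shrunken_average(4, 10, ALPHA, BETA)

# Lots of information: 300 hits in 1000 at-bats (raw average 0.300).
many = shrunken_average(300, 1000, ALPHA, BETA)

if __name__ == "__main__":
    print(f"4/10     -> {few:.3f}")
    print(f"300/1000 -> {many:.3f}")
```

The batter with only 10 at-bats is pulled almost all the way to the prior mean, while the batter with 1000 at-bats stays close to the raw average. Note that this simple version uses the same prior for everyone, which is the assumption the relationship between AB and true average calls into question.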


Instant Replay: Investigating statistical Analysis in Sports

Sidhu, Gagan

arXiv.org Artificial Intelligence

Technology has had an unquestionable impact on the way people watch sports. Along with this technological evolution has come a higher standard to ensure a good viewing experience for the casual sports fan. It can be argued that the pervasion of statistical analysis in sports serves to satiate the fan's desire for detailed sports statistics. The goal of statistical analysis in sports is a simple one: to eliminate subjective analysis. In this paper, we review previous work that attempts to analyze various aspects in sports by using ideas from Markov Chains, Bayesian Inference and Markov Chain Monte Carlo (MCMC) methods. The unifying goal of these works is to achieve an accurate representation of the player's ability, the sport, or the environmental effects on the player's performance. With the prevalence of cheap computation, it is possible that using techniques in Artificial Intelligence could improve the result of statistical analysis in sport. This is best illustrated when evaluating football using Neuro Dynamic Programming, a Control Theory paradigm heavily based on theory in Stochastic processes. The results from this method suggest that statistical analysis in sports may benefit from using ideas from the area of Control Theory or Machine Learning.